df <- read_csv("ESS11.csv")
## Rows: 40156 Columns: 640
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr    (9): name, proddate, cntry, cntbrthd, lnghom1, lnghom2, fbrncntc, mbr...
## dbl  (616): essround, edition, idno, dweight, pspwght, pweight, anweight, nw...
## dttm  (15): inwds, ainws, ainwe, binwe, cinwe, dinwe, einwe, finwe, hinwe, i...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

1 Data preprocessing

glimpse(df)

First of all, I will weight all the variables and decide which one to keep for further detect association rules if the target variable is hincfel, which is the feeling about household’s income nowadays.

There are some direct relevance variables, including employment status, education, and maritial or family context which often shape household income and perceptions of economic comfort. Besides, the subjective indicator such as life satisfaction, happines, trust and satisfaction with the economy can be strong correlates in the financial well-being perception. Additionally, some demographic and social dimension should also be considered, like age, gender, health, household structure.

sub_df <- df %>% select(
    cntry,
    agea,
    gndr,
    eisced,
    maritalb,
    hhmmb,
    mainact,
    stflife,
    happy,
    ppltrst,
    stfeco,
    gincdif,
    health,
    domicil,
    sclmeet,
    vote,
    hincfel)

str(sub_df)
## tibble [40,156 × 17] (S3: tbl_df/tbl/data.frame)
##  $ cntry   : chr [1:40156] "AT" "AT" "AT" "AT" ...
##  $ agea    : num [1:40156] 65 21 53 78 64 59 77 69 52 75 ...
##  $ gndr    : num [1:40156] 1 2 2 2 1 2 2 2 2 2 ...
##  $ eisced  : num [1:40156] 3 5 6 5 3 4 2 7 5 2 ...
##  $ maritalb: num [1:40156] 1 6 1 4 1 1 5 1 1 1 ...
##  $ hhmmb   : num [1:40156] 2 1 3 1 2 2 1 2 3 2 ...
##  $ mainact : num [1:40156] 66 66 66 66 66 66 66 6 66 66 ...
##  $ stflife : num [1:40156] 8 9 10 7 9 8 8 8 8 8 ...
##  $ happy   : num [1:40156] 8 9 9 7 9 8 7 8 8 7 ...
##  $ ppltrst : num [1:40156] 5 10 6 6 6 8 7 8 7 3 ...
##  $ stfeco  : num [1:40156] 6 2 6 4 6 4 6 6 6 7 ...
##  $ gincdif : num [1:40156] 2 1 1 1 2 2 2 2 1 2 ...
##  $ health  : num [1:40156] 3 2 1 3 2 1 2 2 2 2 ...
##  $ domicil : num [1:40156] 3 1 3 1 4 4 3 4 4 4 ...
##  $ sclmeet : num [1:40156] 4 7 4 6 5 6 4 4 4 3 ...
##  $ vote    : num [1:40156] 1 1 1 2 1 1 1 1 1 1 ...
##  $ hincfel : num [1:40156] 1 2 1 2 2 1 2 1 2 2 ...
summary(sub_df)
##     cntry                agea             gndr           eisced      
##  Length:40156       Min.   : 15.00   Min.   :1.000   Min.   : 1.000  
##  Class :character   1st Qu.: 37.00   1st Qu.:1.000   1st Qu.: 3.000  
##  Mode  :character   Median : 53.00   Median :2.000   Median : 4.000  
##                     Mean   : 58.06   Mean   :1.533   Mean   : 4.532  
##                     3rd Qu.: 67.00   3rd Qu.:2.000   3rd Qu.: 6.000  
##                     Max.   :999.00   Max.   :2.000   Max.   :99.000  
##     maritalb          hhmmb           mainact         stflife      
##  Min.   : 1.000   Min.   : 0.000   Min.   : 1.00   Min.   : 0.000  
##  1st Qu.: 1.000   1st Qu.: 2.000   1st Qu.:66.00   1st Qu.: 6.000  
##  Median : 2.000   Median : 2.000   Median :66.00   Median : 8.000  
##  Mean   : 3.702   Mean   : 2.942   Mean   :59.66   Mean   : 7.784  
##  3rd Qu.: 6.000   3rd Qu.: 3.000   3rd Qu.:66.00   3rd Qu.: 9.000  
##  Max.   :99.000   Max.   :99.000   Max.   :99.00   Max.   :99.000  
##      happy           ppltrst           stfeco          gincdif     
##  Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   :1.000  
##  1st Qu.: 7.000   1st Qu.: 3.000   1st Qu.: 3.000   1st Qu.:1.000  
##  Median : 8.000   Median : 5.000   Median : 5.000   Median :2.000  
##  Mean   : 7.741   Mean   : 5.361   Mean   : 5.935   Mean   :2.168  
##  3rd Qu.: 9.000   3rd Qu.: 7.000   3rd Qu.: 6.000   3rd Qu.:3.000  
##  Max.   :99.000   Max.   :99.000   Max.   :99.000   Max.   :9.000  
##      health         domicil        sclmeet            vote      
##  Min.   :1.000   Min.   :1.00   Min.   : 1.000   Min.   :1.000  
##  1st Qu.:1.000   1st Qu.:2.00   1st Qu.: 4.000   1st Qu.:1.000  
##  Median :2.000   Median :3.00   Median : 5.000   Median :1.000  
##  Mean   :2.154   Mean   :2.96   Mean   : 4.941   Mean   :1.382  
##  3rd Qu.:3.000   3rd Qu.:4.00   3rd Qu.: 6.000   3rd Qu.:2.000  
##  Max.   :9.000   Max.   :9.00   Max.   :99.000   Max.   :9.000  
##     hincfel     
##  Min.   :1.000  
##  1st Qu.:1.000  
##  Median :2.000  
##  Mean   :1.921  
##  3rd Qu.:2.000  
##  Max.   :9.000

Check how many countries are there and inspect hincfel column

table(sub_df$cntry)
## 
##   AT   BE   CH   CY   DE   ES   FI   FR   GB   GR   HR   HU   IE   IS   IT   LT 
## 2354 1594 1384  685 2420 1844 1563 1771 1684 2757 1563 2118 2017  842 2865 1365 
##   NL   NO   PL   PT   RS   SE   SI   SK 
## 1695 1337 1442 1373 1563 1230 1248 1442
table(sub_df$hincfel, useNA = "ifany")
## 
##     1     2     3     4     7     8     9 
## 14309 18108  5862  1426   299   133    19

Checking for missing values and empty strings

isna <- colSums(is.na(sub_df))
cat("Missing values by column:\n")
## Missing values by column:
isna
##    cntry     agea     gndr   eisced maritalb    hhmmb  mainact  stflife 
##        0        0        0        0        0        0        0        0 
##    happy  ppltrst   stfeco  gincdif   health  domicil  sclmeet     vote 
##        0        0        0        0        0        0        0        0 
##  hincfel 
##        0

Since the main activity variables has more than 36 thousands missing value, I will remove it from the analysis, also, rows with missing value will be removed.

sub_df <- sub_df %>% select (-mainact)
sub_df <- sub_df %>% filter(!if_any(everything(), is.na))

Rows with refusal, don’t know and no answer will be removed from column hincfel.

For all other columns, refusal answer, don’t know and know answer will be removed as I think it will not bring value to the result.

sub_df <- sub_df %>%
  mutate(
    hincfel = if_else(hincfel %in% c(7,8,9), NA_real_, hincfel),
    agea = if_else(agea  == 999, NA_real_, agea),
    gndr = if_else(agea  == 9, NA_real_, gndr),
    eisced = if_else(eisced %in% c(0, 55, 77,88,99), NA_real_, eisced),
    maritalb = if_else(maritalb  %in% c(77,88,99), NA_real_, maritalb),
    hhmmb = if_else(hhmmb  %in% c(77,88,99), NA_real_, hhmmb),
    stfeco  = if_else(stfeco  %in% c(77,88,99), NA_real_, stfeco),
    happy = if_else(happy  %in% c(77,88,99), NA_real_, happy),
    stflife = if_else(stflife %in% c(77,88,99), NA_real_, stflife),
    ppltrst = if_else(ppltrst %in% c(77,88,99), NA_real_, ppltrst),
    gincdif = if_else(gincdif %in% c(7,8,9), NA_real_, gincdif),
    health = if_else(health %in% c(7,8,9), NA_real_, health),
    domicil = if_else(domicil %in% c(7,8,9), NA_real_, domicil),
    sclmeet = if_else(sclmeet %in% c(77,88,99), NA_real_, sclmeet),
    vote = if_else(vote %in% c(7,8,9), NA_real_, vote))

1.1 Categorize data

sub_df <- sub_df %>%
  mutate(
    agea = case_when(
      agea < 30  ~ "Under 30",
      agea >= 30 & agea < 50 ~ "30 to 50 years old",
      agea >= 50 & agea < 65 ~ "50 to 64 years old",
      agea >= 65  ~ "65 plus"),
    gndr = case_when(
      gndr == 1 ~ "Male",
      gndr == 2 ~ "Female"),
    eisced = case_when(
      eisced == 1 ~ "Primary Education",
      eisced == 2 ~ "Lower Secondary",
      eisced == 3 ~ "Upper Secondary",
      eisced == 4 ~ "Post-Secondary, Non-Tertiary",
      eisced == 5 ~ "Short-cycle Tertiary",
      eisced == 6 ~ "Bachelor's or Equivalent",
      eisced == 7 ~ "Master's or Equivalent",
      eisced == 8 ~ "Doctoral or Equivalent"),
    maritalb = case_when(
      maritalb == 1 ~ "Legally married",
      maritalb == 2 ~ "In a legally registered",
      maritalb == 3 ~ "Legally separated",
      maritalb == 4 ~ "Legally divorced",
      maritalb == 5 ~ "Widowed",
      maritalb == 6 ~ "Never married or in legally registered"),
    hhmmb = case_when(
      hhmmb < 2 ~ "Less then 2 in a family",
      hhmmb >=2 & hhmmb < 3 ~ "2 to 3 family members",
      hhmmb <= 3 ~ "more than 3 family members"),
    stfeco  = case_when(
      stfeco <= 3 ~ "Not satisfied w country ecno",
      stfeco > 3 & hhmmb < 6 ~ "Satisfy w country ecno",
      stfeco >=6 ~ "Very satisfy w country ecno"),
    happy = case_when(
      happy <= 3 ~ "Not happy",
      happy > 3 & happy < 6 ~ "Feeling so-so",
      happy >=6 ~ "Happy"),
    stflife = case_when(
      stflife <= 3 ~ "Not satisfied w life",
      stflife > 3 & stflife < 6 ~ "Life is ok",
      stflife >=6 ~ "Satisfied w life"),
    ppltrst = case_when(
      ppltrst <= 3 ~ "can't be too careful w ppl",
      ppltrst > 3 & ppltrst < 6 ~ "can be trust",
      ppltrst >=6 ~ "can trust most people"),
    gincdif = case_when(
      gincdif < 3 ~ "inc diff shoud reduce",
      gincdif == 3 ~ "inc diff is ok",
      gincdif > 3 ~ "inc diff shoud not reduce"),
    health = case_when(
      health < 3 ~ "good health",
      health == 3 ~ "health ok",
      health > 3 ~ "bad health"),
    domicil = case_when(
      domicil == 1 ~ "big city",
      domicil == 2 ~ "Suburbs/outskirts of big city",
      domicil == 3 ~ "Town/small city",
      domicil == 4 ~ "Country village",
      domicil == 5 ~ "Farm/home in countryside"),
    sclmeet = case_when(
      sclmeet <= 3 ~ "rarely socialize",
      domicil <= 5 ~ "sometimes socialize",
      domicil >5  ~ "often socialize"),
    vote = case_when(
      vote == 1 ~ "voted",
      vote == 2 ~ "did not vote",
      vote == 3 ~ "not eligible to vote"),
    hincfel = case_when(
      hincfel == 1 ~ "Living comfortably on prs income",
      hincfel == 2 ~ "Coping on prs income",
      hincfel == 3 ~ "Difficult on prs income",
      hincfel == 4 ~ "Very difficult on prs income"))
sub_df <- na.omit(sub_df)

Convert data to transactions for association rules

df_items <- lapply(seq_len(nrow(sub_df)), function(i) {
  row_i <- sub_df[i, ]
  paste0(names(sub_df), "=", row_i)})


library(arules)
trans <- as(df_items, "transactions")
summary(trans)
## transactions as itemMatrix in sparse format with
##  24141 rows (elements/itemsets/transactions) and
##  78 columns (items) and a density of 0.2051282 
## 
## most frequent items:
##                   happy=Happy      stflife=Satisfied w life 
##                         20679                         19495 
##       sclmeet=often socialize                    vote=voted 
##                         19010                         18791 
## gincdif=inc diff shoud reduce                       (Other) 
##                         18116                        290165 
## 
## element (itemset/transaction) length distribution:
## sizes
##    16 
## 24141 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      16      16      16      16      16      16 
## 
## includes extended item information - examples:
##                    labels
## 1 agea=30 to 50 years old
## 2 agea=50 to 64 years old
## 3            agea=65 plus
labels_old <- itemLabels(trans)
labels_new <- sub(".*=", "", labels_old)
itemLabels(trans) <- labels_new

2 Association rules mining

freq <- itemFrequency(trans)
freq_sorted <- sort(freq, decreasing=TRUE)
head(freq_sorted, 10)
##                 Happy      Satisfied w life       often socialize 
##             0.8565925             0.8075473             0.7874570 
##                 voted inc diff shoud reduce           good health 
##             0.7783853             0.7504246             0.6498902 
## 2 to 3 family members                Female       Legally married 
##             0.5655938             0.5228035             0.5084711 
##                  Male 
##             0.4771965
itemFrequencyPlot(trans, topN=20, type="absolute", main="Top frequent items")

From the plot, it seem like many people are happy and satisfied with their life as they are the most frequently occurring items, suggesting a significant portion of the population reports a positive outlook or well being.

Also, lots of people would like to socialize often, and many of them did vote at the last national election.

rules <- apriori(trans, parameter = list(supp = 0.01, conf = 0.5, minlen = 2), appearance = list(default = "lhs", rhs = c("Living comfortably on prs income", "Coping on prs income", "Difficult on prs income","Very difficult on prs income")), control=list(verbose=F))
rules_sorted <- sort(rules, by = "lift", decreasing = TRUE)
inspect(head(rules_sorted))
##     lhs                          rhs                                   support confidence   coverage     lift count
## [1] {Master's or Equivalent,                                                                                       
##      Male,                                                                                                         
##      Happy,                                                                                                        
##      good health,                                                                                                  
##      Legally married,                                                                                              
##      can trust most people,                                                                                        
##      often socialize,                                                                                              
##      Satisfied w life,                                                                                             
##      voted}                   => {Living comfortably on prs income} 0.01226130  0.8629738 0.01420819 2.420477   296
## [2] {Master's or Equivalent,                                                                                       
##      Male,                                                                                                         
##      Happy,                                                                                                        
##      good health,                                                                                                  
##      2 to 3 family members,                                                                                        
##      Legally married,                                                                                              
##      can trust most people,                                                                                        
##      often socialize,                                                                                              
##      Satisfied w life}        => {Living comfortably on prs income} 0.01035583  0.8620690 0.01201276 2.417940   250
## [3] {Master's or Equivalent,                                                                                       
##      Male,                                                                                                         
##      Happy,                                                                                                        
##      good health,                                                                                                  
##      Legally married,                                                                                              
##      can trust most people,                                                                                        
##      often socialize,                                                                                              
##      voted}                   => {Living comfortably on prs income} 0.01234414  0.8612717 0.01433246 2.415703   298
## [4] {Master's or Equivalent,                                                                                       
##      Male,                                                                                                         
##      Happy,                                                                                                        
##      good health,                                                                                                  
##      2 to 3 family members,                                                                                        
##      Legally married,                                                                                              
##      can trust most people,                                                                                        
##      often socialize}         => {Living comfortably on prs income} 0.01052152  0.8610169 0.01221987 2.414989   254
## [5] {Master's or Equivalent,                                                                                       
##      Male,                                                                                                         
##      good health,                                                                                                  
##      2 to 3 family members,                                                                                        
##      Legally married,                                                                                              
##      can trust most people,                                                                                        
##      often socialize,                                                                                              
##      Satisfied w life}        => {Living comfortably on prs income} 0.01039725  0.8595890 0.01209560 2.410984   251
## [6] {Master's or Equivalent,                                                                                       
##      Male,                                                                                                         
##      good health,                                                                                                  
##      2 to 3 family members,                                                                                        
##      Legally married,                                                                                              
##      can trust most people,                                                                                        
##      often socialize}         => {Living comfortably on prs income} 0.01064579  0.8595318 0.01238557 2.410823   257
plot(rules, method = "grouped")

There are total of 38673 rules were mined by using Apriori algorithmm that has the length from 2 to 10 items. The highest number of rules (10,016) have a length of 6 items, followed by rules with lengths 5 (6,549) and 7 (9,489) that indicates that the data supports rules with multiple antecedents and consequences.

Since there are many rules that have low support range, meaning they occur only in a few transactions, so I will clean the rules by remove the redundant one.

strong_rules <- rules[!is.redundant(rules)]
strong_rules <- strong_rules[(is.maximal(strong_rules))]
summary(strong_rules)
## set of 2381 rules
## 
## rule length distribution (lhs + rhs):sizes
##   3   4   5   6   7   8   9  10 
##  23 200 541 716 534 228  92  47 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.000   5.000   6.000   6.186   7.000  10.000 
## 
## summary of quality measures:
##     support          confidence        coverage            lift      
##  Min.   :0.01002   Min.   :0.5000   Min.   :0.01189   Min.   :1.101  
##  1st Qu.:0.01040   1st Qu.:0.5260   1st Qu.:0.01806   1st Qu.:1.165  
##  Median :0.01106   Median :0.5517   Median :0.02005   Median :1.227  
##  Mean   :0.01179   Mean   :0.5797   Mean   :0.02070   Mean   :1.407  
##  3rd Qu.:0.01218   3rd Qu.:0.6031   3rd Qu.:0.02220   3rd Qu.:1.629  
##  Max.   :0.03086   Max.   :0.8630   Max.   :0.05712   Max.   :2.420  
##      count      
##  Min.   :242.0  
##  1st Qu.:251.0  
##  Median :267.0  
##  Mean   :284.7  
##  3rd Qu.:294.0  
##  Max.   :745.0  
## 
## mining info:
##   data ntransactions support confidence
##  trans         24141    0.01        0.5
##                                                                                                                                                                                                                                                                           call
##  apriori(data = trans, parameter = list(supp = 0.01, conf = 0.5, minlen = 2), appearance = list(default = "lhs", rhs = c("Living comfortably on prs income", "Coping on prs income", "Difficult on prs income", "Very difficult on prs income")), control = list(verbose = F))
plot(strong_rules, method = "grouped")

After reduce the redundant rules, I have 2381 rules in total. In the next step, I will inspect the rule for different categories in feeling about household’s income nowadays.

2.1 Living comfortably on present income

comfy_rules <- subset(strong_rules, rhs %in% "Living comfortably on prs income")
inspectDT(comfy_rules )
plot(comfy_rules, method = "graph", control = list(type = "items"))
## Available control parameters (with default values):
## layout    =  stress
## circular  =  FALSE
## ggraphdots    =  NULL
## edges     =  <environment>
## nodes     =  <environment>
## nodetext  =  <environment>
## colors    =  c("#EE0000FF", "#EEEEEEFF")
## engine    =  ggplot2
## max   =  100
## verbose   =  FALSE

plot(comfy_rules,
     method = "scatterplot",
     measure = "support",
     shading = "lift",
     engine = "plotly")
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

A prominent factor, indicating that individuals who report good health are more likely to feel comfortable with their income. It might reflects the financial stability associated with fewer healthcare-related expenses.

Suprisingly, male appears to correlate strongly with the feeling of living comfortably on present income which can reflect gender income disparities or societal norms influencing financial perception.

Those who are legally married associciated with better financial comfort due to the possibility of shared financial resources with their partner.

Besides, family with 2 to 3 members are strongly linked to financial comfort, maybe due to the reduce financial strain. Individuals from 50 to 64 years old often experience peak earning years or financial stability, that contribute to higher comfort level.

Regarding geographical factor, the Netherland and Sweden feature prominently, reflecting higher comfort levels in countries with strong welfare systems and economic stability.

2.2 Coping on the present income

coping_rules <- subset(strong_rules, rhs %in% "Coping on prs income")
inspectDT(coping_rules)
plot(coping_rules, method = "graph", control = list(type = "items"), main = "Obesity rules ")
## Available control parameters (with default values):
## layout    =  stress
## circular  =  FALSE
## ggraphdots    =  NULL
## edges     =  <environment>
## nodes     =  <environment>
## nodetext  =  <environment>
## colors    =  c("#EE0000FF", "#EEEEEEFF")
## engine    =  ggplot2
## max   =  100
## verbose   =  FALSE

library(arulesViz)

plot(coping_rules, method = "graph", engine = "htmlwidget")

“Coping” often indicates a mid-range situation that is not truly comfortable, but not severely struggling with the present income. So items like “more than 3 family members” or “2 to 3 family members” might reflect a moderate household size—big enough to cause some financial pressure but not extreme.

“Rarely socialize” can be correlated with sense of control or financial constraints. It might reflect less disposable income for leisure or less inclination to go out—so one might interpret that as an indicator of moderate financial caution.

People are satisfy with country economy but still “Coping” might mean that personal finances are borderline, though they consider the overall economy is not too bad.

People in Lithuania, Hungary, Poland and Finland appear most frequent in this category suggests these countries have a considerable share of respondents who find themselves neither fully comfortable nor severely struggling with their income. While Finland has a well-developed welfare state, other factors like personal debt or higher living costs could constrain how “comfortable” individuals feel.

2.3 Difficult on present income

There is no rules found for people that feel difficult on the present income

2.4 Very difficult on present income

There is no rules found for people that feel very difficult on the present income

3 Conclusion

From both groups who feel comfortable with the current income and those who coping with it, many of them state that the government should reduce the income differences which can better enhance financial well-being of household.

The lack of discernible rules for difficult or very difficult financial situations may indicate the sample sizes for those categories or the respondents in severe financial straits share fewer common traits captured by the variables in this dataset.

Overall, the association rules underscore how health, demographic such as gender, marital status, age, and national context all interplay to shape subjective financial comfort.